Structure of pauses in speech in the context of speaker verification and classification of speech type

نویسندگان

  • Magdalena Igras
  • Bartosz Ziólko
  • Piotr Zelasko
  • Marcin Witkowski
چکیده

Statistics of pauses appearing in Polish as a potential source of biometry information for automatic speaker recognition were described. The usage of three main types of acoustic pauses (silent, filled and breath pauses) and syntactic pauses (punctuation marks in speech transcripts) was investigated quantitatively in three types of spontaneous speech (presentations, simultaneous interpretation and radio interviews) and read speech (audio books). Selected parameters of pauses extracted for each speaker separately or for speaker groups were examined statistically to verify usefulness of information on pauses for speaker recognition and speaker profile estimation. Quantity and duration of filled pauses, audible breaths, and correlation between the temporal structure of speech and the syntax structure of the spoken language were the features which characterize speakers most. The experiment of using pauses in speaker biometry system (using Universal Background Model and i-vectors) resulted in 30 % equal error rate. Including pause-related features to the baseline Mel-frequency cepstral coefficient system has not significantly improved its performance. In the experiment with automatic recognition of three types of spontaneous speech, we achieved 78 % accuracy, using GMM classifier. Silent pause-related features allowed distinguishing between read and spontaneous speech by extreme gradient boosting with 75 % accuracy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparative Study of Gender and Age Classification in Speech Signals

Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...

متن کامل

Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Audio, Speech and Music Processing

دوره 2016  شماره 

صفحات  -

تاریخ انتشار 2016